A Provenance-Integration Framework for Distributed Workflows in Grid Environments

Authors

  • Jing Zhao
  • Fan Sun
  • Carlo Torniai
  • Amol Bakshi
  • Viktor Prasanna
Abstract

Provenance information about complex, distributed workflows is key to data quality control and data reliability maintenance in reservoir management. Distributed and integrated environments in which different workflows consume and transform data require a comprehensive provenance view. In this scenario, provenance collection and integration present significant challenges. In this paper, we categorize provenance information into two kinds: internal provenance, which is provenance within workflow instances, and external provenance, which is provenance across workflows. We propose a provenance-integration framework for grid environments that collects both internal and external provenance and composes the provenance collected from distributed workflows into an integrated view. Existing diverse internal provenance models and their corresponding provenance repositories are wrapped as web services and integrated into our framework in a service-oriented architecture. A provenance index service containing published external provenance connects the internal provenance of multiple workflows by mapping their input/output data objects. The provenance index can also route users' provenance requests to the corresponding provenance services. A set of semantic models is defined in the provenance index service to express the external provenance and the provenance in-

Similar resources

Provenance trails in the Wings/Pegasus system

Our research focuses on creating and executing large-scale scientific workflows that often involve thousands of computations over distributed, shared resources. We describe an approach to workflow creation and refinement that uses semantic representations to 1) describe complex scientific applications in a data-independent manner, 2) automatically generate workflows of computations for given da...

Automation of Network-Based Scientific Workflows

Comprehensive, end-to-end, data and workflow management solutions are needed to handle the increasing complexity of processes and data volumes associated with modern distributed scientific problem solving, such as ultra-scale simulations and high-throughput experiments. The key to the solution is an integrated network-based framework that is functional, dependable, fault-tolerant, and supports ...

On the Use of Abstract Workflows to Capture Scientific Process Provenance

Capturing provenance about artifacts produced by distributed scientific processes is a challenging task. For example, one approach to facilitate the execution of a scientific process in distributed environments is to break down the process into components and to create workflow specifications to orchestrate the execution of these components. However, capturing provenance in such an environment,...

Managing Provenance in Scientific Workflows with ProvManager

Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to the workflow systems. We have proposed a provenance gathering strategy that is independent from workflow system technology. This strategy has evolved into a provenance management system named ProvManager. The main principle is that each workflow ac...

An Identity Crisis in the Life Sciences

myGrid is an e-Science project assisting life scientists to build workflows that gather and co-ordinate data from distributed, autonomous, replicated and heterogeneous resources. The provenance logs of workflow executions are recorded as RDF graphs. The log of one workflow run is used to trace the history of its execution process; however, by aggregating provenance logs of workflow reruns, or run...

Publication date: 2008